Ratio Rules: A New Paradigm for Fast, Quantifiable Data Mining
Authors
Abstract
Association Rule Mining algorithms operate on a data matrix (e.g., customers × products) to derive association rules [2, 23]. We propose a new paradigm, namely, Ratio Rules, which are quantifiable in that we can measure the "goodness" of a set of discovered rules. We propose to use the "guessing error" as a measure of the "goodness", that is, the root-mean-square error of the reconstructed values of the cells of the given matrix, when we pretend that they are unknown. Another contribution is a novel method to guess missing/hidden values from the Ratio Rules that our method derives. For example, if somebody bought $10 of milk and $3 of bread, our rules can "guess" the amount spent on, say, butter. Thus, we can perform a variety of important tasks such as forecasting, answering "what-if" scenarios, detecting outliers, and visualizing the data. Moreover, we show how to compute Ratio Rules in a single pass over the dataset with small memory requirements (a few small matrices), in contrast to traditional association rule mining methods that require multiple passes and/or large memory. Experiments on several real datasets (e.g., basketball and baseball statistics, biological data) demonstrate that the proposed method consistently achieves a "guessing error" of up to 5 times less than the straightforward competitor.
Work performed while at the University of Maryland. This research was partially funded by the Institute for Systems Research (ISR), and by the National Science Foundation under Grants No. EEC-94-02384, IRI-9205273 and IRI-9625428.
Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.
Proceedings of the 24th VLDB Conference, New York, USA, 1998
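The ideas in the abstract (single-pass computation, guessing a hidden value, and the guessing error) can be illustrated with a minimal Python sketch. It assumes the rules are the leading eigenvectors of the column covariance matrix, in line with the PCA basis that the related work listed on this page attributes to the approach. The function names, the least-squares fill-in step, and the synthetic milk/bread/butter amounts are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def ratio_rules_single_pass(rows, k):
    """One pass over the rows: accumulate column sums and X^T X,
    then eigendecompose the resulting covariance matrix."""
    n, col_sum, xtx = 0, None, None
    for row in rows:
        r = np.asarray(row, dtype=float)
        if col_sum is None:
            col_sum = np.zeros_like(r)
            xtx = np.zeros((r.size, r.size))
        col_sum += r
        xtx += np.outer(r, r)
        n += 1
    mean = col_sum / n
    cov = xtx / n - np.outer(mean, mean)
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:k]         # indices of the k largest
    return mean, eigvecs[:, top]                # columns are the assumed "rules"

def guess_hidden(row, hidden, mean, rules):
    """Fill the columns listed in `hidden` using only the known columns.
    Assumption: coordinates in rule space are fit by least squares."""
    r = np.asarray(row, dtype=float)
    known = [j for j in range(r.size) if j not in hidden]
    coords, *_ = np.linalg.lstsq(rules[known, :], r[known] - mean[known], rcond=None)
    filled = r.copy()
    filled[hidden] = mean[hidden] + rules[hidden, :] @ coords
    return filled

def guessing_error(data, mean, rules):
    """Root-mean-square error when each cell in turn is treated as unknown."""
    X = np.asarray(data, dtype=float)
    sq = 0.0
    for row in X:
        for j in range(X.shape[1]):
            sq += (guess_hidden(row, [j], mean, rules)[j] - row[j]) ** 2
    return np.sqrt(sq / X.size)

# Synthetic data (made up): every customer spends in the ratio
# milk : bread : butter = 10 : 3 : 2.
data = [[10 * t, 3 * t, 2 * t] for t in range(1, 6)]
mean, rules = ratio_rules_single_pass(data, k=1)
# Butter is hidden (its placeholder value 0.0 is ignored by the fit).
print(guess_hidden([10.0, 3.0, 0.0], [2], mean, rules)[2])
print(guessing_error(data, mean, rules))
```

Because the synthetic rows follow one exact ratio, a single rule reproduces every cell and the guessing error is essentially zero; on real data the error would grow as the number of retained rules `k` shrinks.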
Similar resources
Learning quantifiable associations via principal sparse non-negative matrix factorization
Association rules are traditionally designed to capture statistical relationships among itemsets in a given database. To additionally capture the quantitative association knowledge, Korn et al. recently proposed a paradigm named Ratio Rules [6] for quantifiable data mining. However, their approach is mainly based on Principal Component Analysis (PCA), and as a result, it cannot guarantee that the...
A new chart-independent method for fast identification of control level of industrial processes using continuous data
A new method is developed for a fast identification of the stability situation of industrial processes. The proposed method includes two factor ratios of the control constants for the upper and lower control limits to process these constants. An indication ratio is then defined as the ratio of the maximum data range value to the difference between the maximum and average values for individual d...
Ratio Rule Mining from Multiple Data Sources
Both multiple source data mining and streaming data mining problems have attracted much attention in the past decade. In contrast to traditional association-rule mining, to capture the quantitative association knowledge, a new paradigm called Ratio Rule (RR) was proposed recently. We extend this framework to mining ratio rules from multiple source data streams which is a novel and challenging p...
A new approach based on data envelopment analysis with double frontiers for ranking the discovered rules from data mining
Data envelopment analysis (DEA) is a relatively new data-oriented approach to evaluating the performance of a set of peer entities called decision-making units (DMUs) that convert multiple inputs into multiple outputs. Within a relatively limited period, DEA has become a strong quantitative and analytical tool for measuring and evaluating performance. In an article written by Toloo et al. (2009...
A new model for persian multi-part words edition based on statistical machine translation
Multi-part words in the English language are hyphenated, with a hyphen separating the different parts. The Persian language has multi-part words as well. Based on Persian morphology, a half-space character is needed to separate the parts of multi-part words, but in many cases people incorrectly use a space character instead of the half-space character. This common incorrect use of the space leads to some s...